HARNESS and fault tolerant MPI
نویسندگان
چکیده
Initial versions of MPI were designed to work eciently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to support a dynamic process model would have aected their performance. As current HPC systems increase in size with greater potential levels of individual node failure, the need arises for new fault tolerant systems to be developed. Here we present a new implementation of MPI called fault tolerant MPI (FT-MPI) that allows the semantics and associated modes of failures to be explicitly controlled by an application via a modi®ed MPI API. Given is an overview of the FT-MPI semantics, design, example applications, debugging tools and some performance issues. Also discussed is the experimental HARNESS core (G_HCORE) implementation that FT-MPI is built to operate upon.
منابع مشابه
FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World
Initial versions of MPI were designed to work efficiently on multiprocessors which had very little job control and thus static process models, subsequently forcing them to support dynamic process operations would have effected their performance. As current HPC systems increase in size with higher potential levels of individual node failure, the need rises for new fault tolerant systems to be de...
متن کاملProgress towards Petascale Virtual Machines
Petascale Virtual Machines (PVM) continues to be a popular software package both for creating personal grids and for building adaptable, fault tolerant applications. We will illustrate this by describing a computational biology environment built on top of PVM that is used by researchers around the world. We will then describe or recent progress in building an even more adaptable distributed vir...
متن کاملHARNESS fault tolerant MPI design, usage and performance issues
Initial versions of MPI were designed to work efficiently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to support a dynamic process model suitable for use on clusters or distributed systems would have reduced their performance. As current HPC collaborative applications increase in size and distribution the potential levels of no...
متن کاملFault Tolerant MPI for the HARNESS Meta-computing System
Initial versions of MPI were designed to work efficiently on multiprocessors which had very little job control and thus static process models. Subsequently forcing them to support a dynamic process model suitable for use on clusters or distributed systems would have reduced their performance. As current HPC collaborative applications increase in size and distribution the potential levels of nod...
متن کاملBuilding and using an Fault Tolerant MPI implementation
In this paper we discuss the design and use of a fault tolerant MPI (FT-MPI) that handles process failures in a way beyond that of the original MPI static process model. FT-MPI allows the semantics and associated modes of failures to be explicitly controlled by an application via a modified functionality within the standard MPI 1.2 API. Given is an overview of the FT-MPI semantics, architecture...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Parallel Computing
دوره 27 شماره
صفحات -
تاریخ انتشار 2001